Row Buffer Locality-Aware Data Placement in Hybrid Memories

Authors

  • HanBin Yoon
  • Justin Meza
  • Rachata Ausavarungnirun
  • Rachael Harding
  • Onur Mutlu
Abstract

Phase change memory (PCM) is a promising alternative to DRAM, though its high latency and energy costs prohibit its adoption as a drop-in DRAM replacement. Hybrid memory systems comprising DRAM and PCM attempt to achieve the low access latencies of DRAM at the large capacities of PCM. However, known solutions neglect to assess the utility of data placed in DRAM, and hence fail to achieve high performance and energy efficiency. We propose a new DRAM-PCM hybrid memory system that exploits row buffer locality. The main idea is to place data that cause frequent row buffer miss accesses in DRAM, and data that do not in PCM. The key insight behind this approach is that data which generally hit in the row buffer can take advantage of the large memory capacity that PCM has to offer, and still be accessed as quickly as if the data were placed in DRAM. We observe our mechanism (1) effectively mitigates the high access latencies and energy costs of PCM, (2) reduces memory channel bandwidth consumption due to the migration of data between DRAM and PCM, and (3) prevents data that exhibit low reuse from polluting DRAM. We evaluate our row buffer locality-aware scheme and show that it outperforms previously proposed hybrid memory systems over a wide range of multiprogrammed workloads. Across 500 workloads on a 16-core system with 256 MB of DRAM, we find that our scheme improves system performance by 41% over using DRAM as a conventional cache to PCM, while reducing maximum slowdown by 32%. Furthermore, our scheme shows 17% performance gain over a competitive all-PCM memory system, and comes to within 21% of the performance of an unlimited-size all-DRAM memory system.
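The placement idea in the abstract can be sketched as a small simulation. This is a minimal, hypothetical model (the threshold value, class name, and bookkeeping are assumptions for illustration, not the paper's actual implementation): each row buffer miss to a PCM row is counted, and a row that misses often enough is migrated to DRAM, while rows that mostly hit in the row buffer stay in PCM.

```python
# Illustrative sketch of a row buffer locality-aware placement policy.
# Assumption: MISS_THRESHOLD and the migration bookkeeping are invented
# for this example; the paper's mechanism is implemented in hardware.
from collections import defaultdict

MISS_THRESHOLD = 2  # assumed: migrate a row after this many row buffer misses


class RBLAPlacement:
    def __init__(self):
        self.open_row = {}                   # bank -> currently open row
        self.miss_count = defaultdict(int)   # PCM row -> row buffer miss count
        self.in_dram = set()                 # rows migrated to DRAM

    def access(self, bank, row):
        hit = self.open_row.get(bank) == row
        self.open_row[bank] = row
        if row in self.in_dram or hit:
            # Row buffer hits in PCM are about as fast as DRAM accesses,
            # so such rows can stay in PCM without hurting latency.
            return "fast"
        # Row buffer miss to a PCM row: count it, and migrate the row
        # to DRAM once it shows poor row buffer locality.
        self.miss_count[row] += 1
        if self.miss_count[row] >= MISS_THRESHOLD:
            self.in_dram.add(row)
        return "slow"
```

A row with a streaming, hit-heavy access pattern never crosses the threshold and remains in PCM; a row that keeps conflicting with other rows in its bank accumulates misses and is cached in DRAM, which also avoids polluting DRAM with low-reuse data.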


Related articles

DynRBLA: A High-Performance and Energy-Efficient Row Buffer Locality-Aware Caching Policy for Hybrid Memories

Phase change memory (PCM) is a promising memory technology that can offer higher memory capacity than DRAM. Unfortunately, PCM’s access latencies and energies are higher than DRAM and its endurance is lower. DRAM-PCM hybrid memory systems use DRAM as a cache to PCM, to achieve the low access latencies and energies, and high endurance of DRAM, while taking advantage of the large PCM capacity. A ...


Evaluating Row Buffer Locality in Future Non-Volatile Main Memories

DRAM-based main memories have read operations that destroy the read data, and as a result, must buffer large amounts of data on each array access to keep chip costs low. Unfortunately, system-level trends such as increased memory contention in multi-core architectures and data mapping schemes that improve memory parallelism may cause only a small amount of the buffered data to be accessed. This...


Managing Hybrid Main Memories with a Page-Utility Driven Performance Model

Hybrid memory systems comprising dynamic random access memory (DRAM) and non-volatile memory (NVM) have been proposed to exploit both the capacity advantage of NVM and the latency and dynamic energy advantages of DRAM. An important problem for such systems is how to place data between DRAM and NVM to improve system performance. In this paper, we devise the first mechanism, called UBM (page Ut...


DRAM-Aware Last-Level Cache Replacement

The cost of last-level cache misses and evictions depends significantly on three major performance-related characteristics of DRAM-based main memory systems: bank-level parallelism, row buffer locality, and write-caused interference. Bank-level parallelism and row buffer locality introduce different latency costs for the processor to service misses: parallel or serial, fast or slow. Write-caused...


Application-aware Adaptive DRAM Bank Partitioning in CMP

Main memory is a shared resource among cores in a chip, and the speed gap between cores and main memory limits total system performance. Thus, main memory should be accessed effectively by each core. Exploiting both the parallelism and the locality of main memory is the key to realizing efficient memory access. The parallelism between memory banks can hide latency by pipelining memory accesses...
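Bank partitioning of the kind this blurb describes is commonly realized through page coloring in the OS allocator. The sketch below is a simplified illustration under assumed parameters (the bank-bit positions, page size, and core-to-bank assignment are all invented for this example, not taken from the paper): each core is only given page frames whose bank index falls in its own partition, so cores stop interfering in each other's banks.

```python
# Illustrative sketch of DRAM bank partitioning via OS page coloring.
# Assumptions: bank index comes from physical address bits just above
# the page offset, and cores get fixed, disjoint bank ranges.
PAGE_SHIFT = 12      # assumed 4 KiB pages
BANK_BITS = 3        # assumed 8 banks


def bank_of(phys_addr):
    """Bank index from address bits [14:12] (assumed address mapping)."""
    return (phys_addr >> PAGE_SHIFT) & ((1 << BANK_BITS) - 1)


def alloc_page(core, free_frames, banks_per_core=4):
    """Return a free frame whose bank lies in this core's partition.
    Assumed policy: core 0 owns banks 0-3, core 1 owns banks 4-7."""
    lo = core * banks_per_core
    for frame in free_frames:
        if lo <= bank_of(frame << PAGE_SHIFT) < lo + banks_per_core:
            free_frames.remove(frame)
            return frame
    return None  # no frame in this core's bank partition
```

Because frames are filtered by bank index at allocation time, a core's accesses are confined to its own banks; the trade-off is that each core also sees less bank-level parallelism, which is why adaptive schemes adjust the partition per application.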





Publication date: 2011